Fix brpc client cannot reconnect to server probabilistically when the…#1774
Fix brpc client cannot reconnect to server probabilistically when the…#1774zhaodongzhi wants to merge 1 commit into
Conversation
cc241fc to
08feee5
Compare
| @@ -787,6 +787,7 @@ friend void DereferenceSocket(Socket*); | |||
| // Flag used to mark whether additional reference has been decreased | |||
| // by either `SetFailed' or `SetRecycle' | |||
| butil::atomic<bool> _recycle_flag; | |||
| } | ||
| return; | ||
| } | ||
| _recycle_mutex.unlock(); |
There was a problem hiding this comment.
可以用std::unique_lock包一下,这里就不用手动释放
|
解决方案这里采用增加mutex的方式其实和brpc整体很多无锁的设计有点不太搭。。。。最初也考虑过一个无锁的方式但是发现比较麻烦 可以看下是否还有更优雅的不使用mutex的解决方案? |
Address是hot path,加了影响性能。而ReleaseAddtionalReference和Revive相对没有那么hot (除非是短连接),加锁影响小一些。 还有一种解决方案是让recycle_flag有3个值:正常、Revive中、已释放。如果ReleaseAddtionalReference发现状态是“Revive中”就sched_yield一下,直到遇到其它两个状态再继续。因为这种情况发生概率很小,这么改对性能几乎不会有影响。 |
嗯嗯,我也是考虑到这两个函数调用频率比较低,所以最后还是选择了加锁的解决方式,相对来说会比较直观简单些 |
… server restarts
Problem
-------
Server restart will trigger Socket::SetFailed -> Socket check healthy
-> Socket::WaitAndReset -> Reconnect to server -> Socket::Revive
on the brpc client side
Socket::Revive
-> _versioned_ref.cas(vref, <id_ver, nref + 1>) // step 1
-> _recycle_flag = false // step 2
After step 1 is successfully executed, the Socket can be acquired by Socket::Address
and the gap between 1 and 2 will cause the ref count in _versioned_ref to not be
decreased in Socket::ReleaseAdditionalReference since _recycle_flag is true
Finally Socket::WaitAndReset can never wait for the expected ref count to mistakenly
think that there are still another requests holding socket references
and fall into an infinite loop
Example:
Session1(Thread1) Session2(Thread2)
Socket::Revive
T1: _versioned_ref.cas(vref, <id_ver, nref + 1>)
T2: Socker::Address success
T3: Server restart or network error happened
T4: Socket::SetFailed
T5: Socket::ReleaseAdditionalReference
// Now _recycle_flag is true T6: _recycle_flag.cas(false, true)
-> Failed: No call Socket::Dereference
T7: _recycle_flag = false
T8: Socket::WaitAndReset
-> nref always be three, not excepted two
-> dead loop. cannot reconnected to server
Solution
--------
Add _recycle_mutex to avoid this race condition
Socket::Revive and Socket::ReleaseAdditionalReference will only be called
when some exceptions occur in the socket, so this mutex will not cause
performance degradation
08feee5 to
8fae316
Compare
|
您好,请问这里是还有其他需要修改的地方吗? |
|
由于PR : #1817 已经解决这个问题,所以这个PR close了。 |
Issue #1773:Fix brpc client cannot reconnect to server probabilistically when the server restarts